<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      Using the Twisted Web Client
    </title>
  </head>
  <body>
    <h1>
      Using the Twisted Web Client
    </h1>

    <h2>
      Overview
    </h2>

    <p>
      This document describes how to use the HTTP client included in Twisted
      Web.  After reading it, you should be able to make HTTP and HTTPS
      requests using Twisted Web.  You will be able to specify the request
      method, headers, and body and you will be able to retrieve the response
      code, headers, and body.
    </p>

    <p>
      A number of higher-level features are also explained, including proxying,
      automatic content encoding negotiation, and cookie handling.
    </p>

    <h3>
      Prerequisites
    </h3>

    <p>

      This document assumes that you are familiar with <a
      href="../../core/howto/defer.xhtml">Deferreds and Failures</a>, and <a
      href="../../core/howto/producers.xhtml">producers and consumers</a>.
      It also assumes you are familiar with the basic concepts of HTTP, such
      as requests and responses, methods, headers, and message bodies.  The
      HTTPS section of this document also assumes you are somewhat familiar with
      SSL and have read about <a href="../../core/howto/ssl.xhtml">using SSL in
      Twisted</a>.
    </p>

    <h2>
      The Agent
    </h2>

    <h3>Issuing Requests</h3>

    <p>
      The <code class="API">twisted.web.client.Agent</code> class is the entry
      point into the client API.  Requests are issued using the <code
      class="API" base="twisted.web.client.Agent">request</code> method, which
      takes as parameters a request method, a request URI, the request headers,
      and an object which can produce the request body (if there is to be one).
      The agent is responsible for connection setup.  Because of this, it
      requires a reactor as an argument to its initializer.  An example of
      creating an agent and issuing a request using it might look like this:
    </p>

    <a href="listings/client/request.py" class="py-listing">
      Issue a request with an Agent
    </a>

    <p>
      As may be obvious, this issues a new <em>GET</em> request for <em>/</em>
      to the web server on <code>example.com</code>.  <code>Agent</code> is
      responsible for resolving the hostname into an IP address and connecting
      to it on port 80 (for <em>HTTP</em> URIs), port 443 (for <em>HTTPS</em>
      URIs), or on the port number specified in the URI itself.  It is also
      responsible for cleaning up the connection afterwards.  This code sends
      a request which includes one custom header, <em>User-Agent</em>.  The
      last argument passed to <code>Agent.request</code> is <code>None</code>,
      though, so the request has no body.
    </p>

    <p>
      Sending a request which does include a body requires passing an object
      providing <code class="API">twisted.web.iweb.IBodyProducer</code>
      to <code>Agent.request</code>.  This interface extends the more general
      <code class="API" base="twisted.internet.interfaces">IPushProducer</code>
      by adding a new <code>length</code> attribute and adding several
      constraints to the way the producer and consumer interact.
    </p>

    <ul>
      <li>
        The length attribute must be a non-negative integer or the constant
        <code>twisted.web.iweb.UNKNOWN_LENGTH</code>.  If the length is known,
        it will be used to specify the value for the
        <em>Content-Length</em> header in the request.  If the length is
        unknown the attribute should be set to <code>UNKNOWN_LENGTH</code>.
        Since more servers support <em>Content-Length</em>, if a length can be
        provided it should be.
      </li>

      <li>
        An additional method is required on <code>IBodyProducer</code>
        implementations: <code>startProducing</code>.  This method is used to
        associate a consumer with the producer.  It should return a
        <code>Deferred</code> which fires when all data has been produced.
      </li>

      <li>
        <code>IBodyProducer</code> implementations should never call the
        consumer's <code>unregisterProducer</code> method.  Instead, when it
        has produced all of the data it is going to produce, it should only
        fire the <code>Deferred</code> returned by <code>startProducing</code>.
      </li>
    </ul>

    <p>
      For additional details about the requirements of <code class="API"
      base="twisted.web.iweb">IBodyProducer</code> implementations, see
      the API documentation.
    </p>

    <p>
      Here's a simple <code>IBodyProducer</code> implementation which
      writes an in-memory string to the consumer:
    </p>

    <a href="listings/client/stringprod.py" class="py-listing">
      A string-based body producer.
    </a>

    <p>
      This producer can be used to issue a request with a body:
    </p>

    <a href="listings/client/sendbody.py" class="py-listing">
      Issue a request with a body.
    </a>

    <p>
      If you want to upload a file or you just have some data in a string, you
      don't have to copy <code>StringProducer</code> though.  Instead, you can
      use <code class="API" base="twisted.web.client">FileBodyProducer</code>.
      This <code>IBodyProducer</code> implementation works with any file-like
      object (so use it with a <code>StringIO</code> if your upload data is
      already in memory as a string); the idea is the same
      as <code>StringProducer</code> from the previous example, but with a
      little extra code to only send data as fast as the server will take it.
    </p>

    <a href="listings/client/filesendbody.py" class="py-listing">
      Another way to issue a request with a body.
    </a>

    <p>
      <code>FileBodyProducer</code> closes the file when it no longer needs it.
    </p>

    <h3>
      Receiving Responses
    </h3>

    <p>
      So far, the examples have demonstrated how to issue a request.  However,
      they have ignored the response, except for showing that it is a
      <code>Deferred</code> which seems to fire when the response has been
      received.  Next we'll cover what that response is and how to interpret
      it.
    </p>

    <p>
      <code>Agent.request</code>, as with most <code>Deferred</code>-returning
      APIs, can return a <code>Deferred</code> which fires with a
      <code>Failure</code>.  If the request fails somehow, this will be
      reflected with a failure.  This may be due to a problem looking up the
      host IP address, or it may be because the HTTP server is not accepting
      connections, or it may be because of a problem parsing the response, or
      any other problem which arises which prevents the response from being
      received.  It does <em>not</em> include responses with an error status.
    </p>

    <p>
      If the request succeeds, though, the <code>Deferred</code> will fire with
      a <code class="API" base="twisted.web.client">Response</code>.  This
      happens as soon as all the response headers have been received.  It
      happens before any of the response body, if there is one, is processed.
      The <code>Response</code> object has several attributes giving the
      response information: its code, version, phrase, and headers, as well as
      the length of the body to expect.  The <code>Response</code> object also
      has a method which makes the response body available: <code class="API"
      base="twisted.web.client.Response">deliverBody</code>.  Using the
      attributes of the response object and this method, here's an example which
      displays part of the response to a request:
    </p>

    <a href="listings/client/response.py" class="py-listing">
      Inspect the response.
    </a>

    <p>
      The <code>BeginningPrinter</code> protocol in this example is passed to
      <code>Response.deliverBody</code> and the response body is then delivered
      to its <code>dataReceived</code> method as it arrives.  When the body has
      been completely delivered, the protocol's <code>connectionLost</code>
      method is called.  It is important to inspect the <code>Failure</code>
      passed to <code>connectionLost</code>.  If the response body has been
      completely received, the failure will wrap a <code
      class="API">twisted.web.client.ResponseDone</code> exception.  This
      indicates that it is <em>known</em> that all data has been received.  It
      is also possible for the failure to wrap a <code
      class="API">twisted.web.http.PotentialDataLoss</code> exception: this
      indicates that the server framed the response such that there is no way
      to know when the entire response body has been received.  Only
      HTTP/1.0 servers should behave this way.  Finally, it is possible for
      the exception to be of another type, indicating guaranteed data loss for
      some reason (a lost connection, a memory error, etc).
    </p>

    <p>
      Just as protocols associated with a TCP connection are given a transport,
      so will be a protocol passed to <code>deliverBody</code>.  Since it makes
      no sense to write more data to the connection at this stage of the
      request, though, the transport <em>only</em> provides <code class="API"
      base="twisted.internet.interfaces">IPushProducer</code>.  This allows the
      protocol to control the flow of the response data: a call to the
      transport's <code>pauseProducing</code> method will pause delivery; a
      later call to <code>resumeProducing</code> will resume it.  If it is
      decided that the rest of the response body is not desired,
      <code>stopProducing</code> can be used to stop delivery permanently;
      after this, the protocol's <code>connectionLost</code> method will be
      called.
    </p>

    <p>
      An important thing to keep in mind is that the body will only be read
      from the connection after <code>Response.deliverBody</code> is called.
      This also means that the connection will remain open until this is done
      (and the body read).  So, in general, any response with a body
      <em>must</em> have that body read using <code>deliverBody</code>.  If the
      application is not interested in the body, it should issue a
      <em>HEAD</em> request or use a protocol which immediately calls
      <code>stopProducing</code> on its transport.
    </p>

    <h3>HTTP over SSL</h3>

    <p>
      Everything you've read so far applies whether the scheme of the request
      URI is <em>HTTP</em> or <em>HTTPS</em>.  However, to control the SSL
      negotiation performed when an <em>HTTPS</em> URI is requested, there's
      one extra object to pay attention to: the SSL context factory.
    </p>

    <p>
      <code>Agent</code>'s constructor takes an optional second argument, a
      context factory.  This is an object like the context factory described
      in <a href="../../core/howto/ssl.xhtml">Using SSL in Twisted</a> but has
      one small difference.  The <code>getContext</code> method of this factory
      accepts the address from the URL being requested.  This allows it to
      return a context object which verifies that the server's certificate
      matches the URL being requested.
    </p>

    <p>
      Here's an example which shows how to use <code>Agent</code> to request
      an <em>HTTPS</em> URL with no certificate verification.
    </p>

    <pre class="python">
from twisted.python.log import err
from twisted.web.client import Agent
from twisted.internet import reactor
from twisted.internet.ssl import ClientContextFactory

class WebClientContextFactory(ClientContextFactory):
    def getContext(self, hostname, port):
        return ClientContextFactory.getContext(self)

def display(response):
    print "Received response"
    print response

def main():
    contextFactory = WebClientContextFactory()
    agent = Agent(reactor, contextFactory)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()
    </pre>

    <p>
      The important point to notice here is that <code>getContext</code> now
      accepts two arguments, a hostname and a port number.  These two arguments,
      a <code>str</code> and an <code>int</code>, give the address to which a
      connection is being established to request an HTTPS URL.  Because an agent
      might make multiple requests over a single connection,
      <code>getContext</code> may not be called once for each request.  A second
      or later request for a URL with the same hostname as a previous request
      may re-use an existing connection, and therefore will re-use the
      previously returned context object.
    </p>

    <p>
      To configure SSL options or enable certificate verification or hostname
      checking, provide a context factory which creates suitably configured
      context objects.
    </p>

    <h3>HTTP Persistent Connection</h3>

    <p>
      HTTP persistent connections use the same TCP connection to send and
      receive multiple HTTP requests/responses. This reduces latency and TCP
      connection establishment overhead.
    </p>

    <p>
      The constructor of <code class="API">twisted.web.client.Agent</code>
      takes an optional parameter pool, which should be an instance
      of <code class="API"
      base="twisted.web.client">HTTPConnectionPool</code>, which will be used
      to manage the connections.  If the pool is created with the
      parameter <code>persistent</code> set to <code>True</code> (the
      default), it will not close connections when the request is done, and
      instead hold them in its cache to be re-used.
    </p>

    <p>
      Here's an example which sends requests over a persistent connection:
    </p>

    <pre class="python">
from twisted.internet import reactor
from twisted.internet.defer import Deferred, DeferredList
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent, HTTPConnectionPool

class IgnoreBody(Protocol):
    def __init__(self, deferred):
        self.deferred = deferred

    def dataReceived(self, bytes):
        pass

    def connectionLost(self, reason):
        self.deferred.callback(None)


def cbRequest(response):
    print 'Response code:', response.code
    finished = Deferred()
    response.deliverBody(IgnoreBody(finished))
    return finished

pool = HTTPConnectionPool(reactor)
agent = Agent(reactor, pool=pool)

def requestGet(url):
    d = agent.request('GET', url)
    d.addCallback(cbRequest)
    return d

# Two requests to the same host:
d = requestGet('http://localhost:8080/foo').addCallback(
    lambda ign: requestGet("http://localhost:8080/bar"))
def cbShutdown(ignored):
    reactor.stop()
d.addCallback(cbShutdown)

reactor.run()
    </pre>

    <p>
      Here, the two requests are to the same host, one after the each
      other. In most cases, the same connection will be used for the second
      request, instead of two different connections when using a
      non-persistent pool.
    </p>

    <h3>Multiple Connections to the Same Server</h3>

    <p>
      <code class="API">twisted.web.client.HTTPConnectionPool</code> instances
      have an attribute
      called <code class="python">maxPersistentPerHost</code> which limits the
      number of cached persistent connections to the same server. The default
      value is 2.  This is effective only when the <code class="API"
      base="twisted.web.client.HTTPConnectionPool">persistent</code> option is
      True. You can change the value like bellow:
    </p>

    <pre class="python">
from twisted.web.client import HTTPConnectionPool

pool = HTTPConnectionPool(reactor, persistent=True)
pool.maxPersistentPerHost = 1
    </pre>

    <p>
      With the default value of 2, the pool keeps around two connections to
      the same host at most. Eventually the cached persistent connections will
      be closed, by default after 240 seconds; you can change this timeout
      value with the <code class="python">cachedConnectionTimeout</code>
      attribute of the pool. To force all connections to close use
      the <code class="API"
      base="twisted.web.client.HTTPConnectionPool">closeCachedConnections</code>
      method.
    </p>


    <h4>Automatic Retries</h4>

    <p>If a request fails without getting a response, and the request is
    something that hopefully can be retried without having any side-effects
    (e.g. a request with method GET), it will be retried automatically when
    sending a request over a previously-cached persistent connection. You can
    disable this behavior by setting <code class="API"
    base="twisted.web.client.HTTPConnectionPool">retryAutomatically</code>
    to <code>False</code>. Note that each request will only be retried
    once.</p>


    <h3>Following redirects</h3>

    <p>
      By itself, <code>Agent</code> doesn't follow HTTP redirects (responses
      with 301, 302, 303, 307 status codes and a <code>location</code> header
      field). You need to use the <code
      class="API">twisted.web.client.RedirectAgent</code> class to do so. It
      implements a rather strict behavior of the RFC, meaning it will redirect
      301 and 302 as 307, only on <code>GET</code> and <code>HEAD</code>
      requests.
    </p>
    <p>
      The following example shows how to have a redirect-enabled agent.
    </p>

    <pre class="python">
from twisted.python.log import err
from twisted.web.client import Agent, RedirectAgent
from twisted.internet import reactor

def display(response):
    print "Received response"
    print response

def main():
    agent = RedirectAgent(Agent(reactor))
    d = agent.request("GET", "http://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()
</pre>

    <h3>Using a HTTP proxy</h3>

    <p>
      To be able to use HTTP proxies with an agent, you can use the <code
      class="API">twisted.web.client.ProxyAgent</code> class. It supports the
      same interface as <code>Agent</code>, but takes the endpoint of the proxy
      as initializer argument.
    </p>

    <p>
      Here's an example demonstrating the use of an HTTP proxy running on
      localhost:8000.
    </p>

    <pre class="python">
from twisted.python.log import err
from twisted.web.client import ProxyAgent
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint

def display(response):
    print "Received response"
    print response

def main():
    endpoint = TCP4ClientEndpoint(reactor, "localhost", 8000)
    agent = ProxyAgent(endpoint)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()
</pre>

    <p>
      Please refer to the <a
      href="../../core/howto/endpoints.xhtml">endpoints documentation</a> for
      more information about how they work and the <code
      class="API">twisted.internet.endpoints</code> API documentation to learn
      what other kinds of endpoints exist.
    </p>

    <h3>Handling HTTP cookies</h3>

    <p>
      An existing agent instance can be wrapped with
      <code class="API">twisted.web.client.CookieAgent</code> to automatically
      store, send and track HTTP cookies. A <code>CookieJar</code>
      instance, from the Python standard library module
      <a href="http://docs.python.org/library/cookielib.html">cookielib</a>, is
      used to store the cookie information. An example of using
      <code>CookieAgent</code> to perform a request and display the collected
      cookies might look like this:
    </p>

    <a href="listings/client/cookies.py" class="py-listing">
      Storing cookies with CookieAgent
    </a>

    <h3>Automatic Content Encoding Negotiation</h3>

    <p>
      <code class="API">twisted.web.client.ContentDecoderAgent</code> adds
      support for sending <em>Accept-Encoding</em> request headers and
      interpreting <em>Content-Encoding</em> response headers.  These headers
      allow the server to encode the response body somehow, typically with some
      compression scheme to save on transfer
      costs.  <code>ContentDecoderAgent</code> provides this functionality as a
      wrapper around an existing agent instance.  Together with one or more
      decoder objects (such as
      <code class="API">twisted.web.client.GzipDecoder</code>), this wrapper
      automatically negotiates an encoding to use and decodes the response body
      accordingly.  To application code using such an agent, there is no visible
      difference in the data delivered.
    </p>

    <a href="listings/client/gzipdecoder.py" class="py-listing">
      Requesting gzip encoded responses, with automatic decompression.
    </a>

    <p>
      Implementing support for new content encodings is as simple as writing a
      new class like <code>GzipDecoder</code> that can decode a response using
      the new encoding.  As there are not many content encodings in widespread
      use, gzip is the only encoding supported by Twisted itself.
    </p>

    <h2>
      Conclusion
    </h2>

    <p>
      You should now understand the basics of the Twisted Web HTTP client.  In
      particular, you should understand:
    </p>

    <ul>
      <li>
        How to issue requests with arbitrary methods, headers, and bodies.
      </li>
      <li>
        How to access the response version, code, phrase, headers, and body.
      </li>
      <li>
        How to store, send, and track cookies.
      </li>
      <li>
        How to control the streaming of the response body.
      </li>
      <li>
        How to enable the HTTP persistent connection, and control the
        number of connections.
      </li>
    </ul>
  </body>
</html>
